Gorder: An Efficient Method for KNN Join Processing

نویسندگان

  • Chenyi Xia
  • Hongjun Lu
  • Beng Chin Ooi
  • Jin Hu
چکیده

An important but very expensive primitive operation of high-dimensional databases is the KNearest Neighbor (KNN) similarity join. The operation combines each point of one dataset with its KNNs in the other dataset and it provides more meaningful query results than the range similarity join. Such an operation is useful for data mining and similarity search. In this paper, we propose a novel KNN-join algorithm, called the Gorder (or the G-ordering KNN) join method. Gorder is a block nested loop join method that exploits sorting, join scheduling and distance computation filtering and reduction to reduce both I/O and CPU costs. It sorts input datasets into the G-order and applied the scheduled block nested loop join on the G-ordered data. The distance computation reduction is employed to further reduce CPU cost. It is simple and yet efficient, and handles high-dimensional data efficiently. Extensive experiments on both synthetic cluster and real life datasets were conducted, and the results illustrate that Gorder is an efficient KNN-join method and outperforms existing methods by a wide margin.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Distributed Approach to Continuous Queries with kNN Join Processing in Spatial Telemetric Data Warehouse

This chapter describes realization of distributed approach to continuous queries with kNN join processing in the spatial telemetric data warehouse. Due to dispersion of the developed system, new structural members were distinguished: the mobile object simulator, the kNN join processing service, and the query manager. Distributed tasks communicate using JAVA RMI methods. The kNN queries (k Neare...

متن کامل

Boundary Points Detection Using Adjacent Grid Block Selection (AGBS) kNN-Join Method

Interpretability, usability, and boundary points detection of the clustering results are of fundamental importance. The cluster boundary detection is important for many real world applications, such as geo-spatial data analysis and point-based computer graphics. In this paper, we have proposed an efficient solution for finding boundary points in multi-dimensional datasets that uses the BORDER [...

متن کامل

High-dimensional kNN joins with incremental updates

The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is an especially expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this...

متن کامل

Efficient index-based KNN join processing for high-dimensional data

In many advanced database applications (e.g., multimedia databases), data objects are transformed into high-dimensional points and manipulated in high-dimensional space. One of the most important but costly operations is the similarity join that combines similar points from multiple datasets. In this paper, we examine the problem of processing K-nearest neighbor similarity join (KNN join). KNN ...

متن کامل

Efficient Processing of k Nearest Neighbor Joins using MapReduce

k nearest neighbor join (kNN join), designed to find k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join on a centr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004